Emulating Radiative Transfer in Astrophysical Environments
Rost, Rune, Branca, Lorenzo, Buck, Tobias
Radiative transfer is a fundamental process in astrophysics, essential for both interpreting observations and modeling thermal and dynamical feedback in simulations via ionizing radiation and photon pressure. However, numerically solving the underlying radiative transfer equation is computationally intensive due to the complex interaction of light with matter and the disparity between the speed of light and the typical gas velocities in astrophysical environments, making it particularly expensive to include the effects of on-the-fly radiation in hydrodynamic simulations. This motivates the development of surrogate models that can significantly accelerate radiative transfer calculations while preserving high accuracy. We present a surrogate model based on a Fourier Neural Operator architecture combined with U-Nets. Our model approximates three-dimensional, monochromatic radiative transfer in time-dependent regimes, in the absorption-emission approximation, achieving speedups of more than two orders of magnitude while maintaining an average relative error below 3%, demonstrating our approach's potential to be integrated into state-of-the-art hydrodynamic simulations.
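The spectral-convolution step at the heart of a Fourier Neural Operator can be sketched in one dimension: transform the input to Fourier space, scale a truncated set of low-frequency modes by learned complex weights, and transform back. This is only an illustrative stand-in (toy weights, naive DFT, 1D), not the paper's trained 3D FNO/U-Net model.

```python
import cmath
import math

def dft(x):
    """Naive DFT; a real implementation would call an FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_conv_1d(x, weights):
    """Fourier-layer core: transform, scale the lowest len(weights) modes by
    learned complex weights, truncate the remaining modes, transform back."""
    X = dft(x)
    Y = [X[k] * weights[k] if k < len(weights) else 0j for k in range(len(X))]
    return [y.real for y in idft(Y)]  # real part: the emulated field is real

signal = [1.0, 0.5, -0.2, 0.3, 0.0, -0.5, 0.1, 0.4]
out = spectral_conv_1d(signal, weights=[1.0 + 0j, 0.5 - 0.1j])
```

Keeping only a few modes is what makes the layer resolution-independent and cheap; the truncation count and the weights are hyperparameters and learned parameters of the real model.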
LongTail-Swap: benchmarking language models' abilities on rare words
Algayres, Robin, Saint-James, Charles-Éric, Luthra, Mahi, Shen, Jiayi, Lin, Dongyan, Benchekroun, Youssef, Moritz, Rashel, Pino, Juan, Dupoux, Emmanuel
Children learn to speak with a low amount of data and can be taught new words on a few-shot basis, making them particularly data-efficient learners. The BabyLM challenge aims at exploring language model (LM) training in the low-data regime but uses metrics that concentrate on the head of the word distribution. Here, we introduce LongTail-Swap (LT-Swap), a benchmark that focuses on the tail of the distribution, i.e., measures the ability of LMs to learn new words with very little exposure, as infants do. LT-Swap is a pretraining corpus-specific test set of acceptable versus unacceptable sentence pairs that isolate semantic and syntactic usage of rare words. Models are evaluated in a zero-shot fashion by computing the average log probabilities over the two members of each pair. We built two such test sets associated with the 10M-word and 100M-word BabyLM training sets, respectively, and evaluated 16 models from the BabyLM leaderboard. Our results not only highlight the poor performance of language models on rare words but also reveal that performance differences across LM architectures are much more pronounced in the long tail than in the head. This offers new insights into which architectures are better at handling rare word generalization. We have also made the code publicly available.
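The zero-shot evaluation described above reduces to comparing average per-token log probabilities across the two members of each pair. A minimal sketch, using a toy unigram model as a hypothetical stand-in for a real pretrained LM:

```python
import math
from collections import Counter

# Toy unigram "language model" built from a tiny corpus; it stands in for a
# real pretrained LM's token log-probabilities (hypothetical, for illustration).
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(corpus)
total = sum(counts.values())

def avg_logprob(sentence, alpha=1.0):
    """Average per-token log probability with add-alpha smoothing."""
    vocab = len(counts) + 1  # reserve one slot for unseen tokens
    logps = [math.log((counts[tok] + alpha) / (total + alpha * vocab))
             for tok in sentence.split()]
    return sum(logps) / len(logps)

def prefers_acceptable(acceptable, unacceptable):
    """LT-Swap-style zero-shot judgment: does the model assign the higher
    average log probability to the acceptable member of the pair?"""
    return avg_logprob(acceptable) > avg_logprob(unacceptable)

score = prefers_acceptable("the cat sat on the mat", "the zyx sat on the mat")
```

Averaging (rather than summing) log probabilities keeps the comparison fair when the two sentences differ in length; the benchmark's accuracy is then the fraction of pairs where the acceptable member wins.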
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Slovenia (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
Choi, Woongjib, Lee, Sangmin, Lim, Hyungseob, Kang, Hong-Goo
In this paper, we present a vocoder-free framework for audio super-resolution that employs a flow matching generative model to capture the conditional distribution of complex-valued spectral coefficients. Unlike conventional two-stage diffusion-based approaches that predict a mel-spectrogram and then rely on a pre-trained neural vocoder to synthesize waveforms, our method directly reconstructs waveforms via the inverse Short-Time Fourier Transform (iSTFT), thereby eliminating the dependence on a separate vocoder. This design not only simplifies end-to-end optimization but also overcomes a critical bottleneck of two-stage pipelines, where the final audio quality is fundamentally constrained by vocoder performance. Experiments show that our model consistently produces high-fidelity 48 kHz audio across diverse upsampling factors, achieving state-of-the-art performance on both speech and general audio datasets.
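The invertibility that makes the vocoder-free design possible can be seen in a toy STFT/iSTFT round trip: with a periodic Hann window at 50% overlap, overlap-added inverse frames recover interior samples exactly. This is a pure-Python sketch of the transform pair only, not the paper's generative model.

```python
import cmath
import math

N, HOP = 8, 4  # frame length and hop size (50% overlap)
win = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann

def stft(x):
    """Complex spectral coefficients of each windowed frame (naive DFT)."""
    frames = []
    for start in range(0, len(x) - N + 1, HOP):
        seg = [x[start + n] * win[n] for n in range(N)]
        frames.append([sum(seg[t] * cmath.exp(-2j * math.pi * k * t / N)
                           for t in range(N)) for k in range(N)])
    return frames

def istft(frames, length):
    """Inverse DFT of each frame plus overlap-add; the periodic Hann window
    at 50% overlap sums to one, so interior samples are recovered exactly."""
    out = [0.0] * length
    for i, frame in enumerate(frames):
        for n in range(N):
            out[i * HOP + n] += sum(frame[k] * cmath.exp(2j * math.pi * k * n / N)
                                    for k in range(N)).real / N
    return out

x = [math.sin(0.7 * n) for n in range(16)]
rec = istft(stft(x), len(x))
```

Because the complex coefficients determine the waveform exactly (up to edge frames), a model that predicts them well needs no separate vocoder, which is the bottleneck the abstract argues two-stage mel-spectrogram pipelines inherit.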
- Media > Music (0.46)
- Leisure & Entertainment (0.46)
TF-MLPNet: Tiny Real-Time Neural Speech Separation
Itani, Malek, Chen, Tuochao, Gollakota, Shyamnath
Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real-time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real-time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks in real-time on the GAP9 processor, achieving a 3.5-4x runtime reduction compared to prior speech separation models.
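The alternating fully connected layers described above can be sketched on a small [frequency x channel] grid: one weight matrix mixes channels independently at each frequency bin, the next mixes frequencies independently per channel. Identity weights are placeholders so the example is checkable; the real network learns these weights and adds nonlinearities and per-bin temporal convolutions.

```python
def matvec(W, v):
    """Dense (fully connected) layer without bias."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mix_channels(X, W):
    """Apply the same channel-mixing weights independently at every frequency bin."""
    return [matvec(W, row) for row in X]              # X: [freq][channel]

def mix_frequencies(X, W):
    """Apply the same frequency-mixing weights independently for every channel."""
    cols = [list(col) for col in zip(*X)]             # transpose to [channel][freq]
    mixed = [matvec(W, col) for col in cols]
    return [list(row) for row in zip(*mixed)]         # back to [freq][channel]

F, C = 4, 3
X = [[float(f + c) for c in range(C)] for f in range(F)]
# Identity weights keep the example verifiable; a trained model would learn these.
Wc = [[1.0 if i == j else 0.0 for j in range(C)] for i in range(C)]
Wf = [[1.0 if i == j else 0.0 for j in range(F)] for i in range(F)]
Y = mix_frequencies(mix_channels(X, Wc), Wf)
```

Sharing one small weight matrix across all bins (and another across all channels) is what keeps the parameter and compute budget within reach of a low-power accelerator, compared with one large dense layer over the flattened grid.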
Simulation-based population inference of LISA's Galactic binaries: Bypassing the global fit
Srinivasan, Rahul, Barausse, Enrico, Korsakova, Natalia, Trotta, Roberto
The Laser Interferometer Space Antenna (LISA) is expected to detect thousands of individually resolved gravitational wave sources, overlapping in time and frequency, on top of unresolved astrophysical and/or primordial backgrounds. Disentangling resolved sources from backgrounds and extracting their parameters in a computationally intensive "global fit" is normally regarded as a necessary step toward reconstructing the properties of the underlying astrophysical populations. Here, we show that it is possible to infer the properties of the most numerous population of LISA sources - Galactic double white dwarfs - directly from the frequency (or, equivalently, time) strain series, by using a simulation-based approach that bypasses the global fit entirely. By training a normalizing flow on a custom-designed compression of simulated LISA frequency series from the Galactic double white dwarf population, we demonstrate how to infer the posterior distribution of population parameters (e.g., mass function, frequency, and spatial distributions). This allows for extracting information on the population parameters from both resolved and unresolved sources simultaneously and in a computationally efficient manner. Our approach to target population properties directly can be readily extended to other source classes (e.g., massive and stellar-mass black holes, extreme mass ratio inspirals), provided fast simulations are available, and to scenarios involving non-Gaussian or non-stationary noise (e.g., data gaps).
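The density evaluation a normalizing flow relies on is the change-of-variables formula. A one-parameter affine flow makes the mechanics visible; the paper's flow stacks many learned invertible transforms over the compressed frequency series, which this sketch does not attempt.

```python
import math

def base_logpdf(z):
    """Standard normal log-density (the flow's base distribution)."""
    return -0.5 * (z * z + math.log(2 * math.pi))

def affine_flow_logpdf(x, scale, shift):
    """Change of variables for the invertible map z = (x - shift) / scale:
    log p(x) = log p_base(z) + log |dz/dx| = log p_base(z) - log |scale|."""
    z = (x - shift) / scale
    return base_logpdf(z) - math.log(abs(scale))

lp = affine_flow_logpdf(1.0, scale=2.0, shift=1.0)  # log-density of N(1, 2^2) at x=1
```

Training adjusts the transform's parameters to maximize exactly this log-density on simulated (parameters, data) pairs, which is what lets the trained flow return a posterior over population parameters without an explicit likelihood or global fit.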
- North America > United States (0.28)
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (6 more...)
An Efficient GPU-based Implementation for Noise Robust Sound Source Localization
Lin, Zirui, Takigahira, Masayuki, Terakado, Naoya, Gulzar, Haris, Busto, Monikka Roslianna, Eda, Takeharu, Itoyama, Katsutoshi, Nakadai, Kazuhiro, Amano, Hideharu
Dept. of Information and Computer Science, Keio University, Kanagawa, Japan. Email: hunga@am.ics.keio.ac.jp
Abstract — Robot audition, encompassing Sound Source Localization (SSL), Sound Source Separation (SSS), and Automatic Speech Recognition (ASR), enables robots and smart devices to acquire auditory capabilities similar to human hearing. Despite their wide applicability, processing multi-channel audio signals from microphone arrays in SSL involves computationally intensive matrix operations, which can hinder efficient deployment on Central Processing Units (CPUs), particularly in embedded systems with limited CPU resources. This paper introduces a GPU-based implementation of SSL for robot audition, utilizing the Generalized Singular Value Decomposition-based Multiple Signal Classification (GSVD-MUSIC), a noise-robust algorithm, within the HARK platform, an open-source software suite. For a 60-channel microphone array, the proposed implementation achieves significant performance improvements. On the Jetson AGX Orin, an embedded device powered by an NVIDIA GPU and ARM Cortex-A78AE v8.2 64-bit CPUs, we observe speedups of 5648.7x for the GSVD calculation and 10.7x for the SSL module, while a server configured with an NVIDIA A100 GPU and AMD EPYC 7352 CPUs shows speedups of 4245.1x for the GSVD calculation and 17.3x for the entire SSL module, making real-time processing feasible for large-scale microphone arrays and providing ample capacity for real-time processing of subsequent machine learning or deep learning tasks.
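The MUSIC idea underlying GSVD-MUSIC can be shown in a stripped-down form: project each candidate direction's steering vector onto the noise subspace and invert; the pseudospectrum peaks where the projection vanishes. This two-microphone, noise-free sketch (where the noise subspace is just the orthogonal complement of the source's steering vector) illustrates only that core step, not GSVD noise whitening, the 60-channel eigendecomposition, or HARK's implementation.

```python
import cmath
import math

def steering(theta, d_over_lambda=0.5):
    """Steering vector of a two-microphone array for direction theta (radians)."""
    return [1 + 0j, cmath.exp(-2j * math.pi * d_over_lambda * math.sin(theta))]

def music_spectrum(theta_scan, theta_src):
    """Single-source, noise-free MUSIC: the noise subspace is simply the
    orthogonal complement of the source's steering vector."""
    a0, a1 = steering(theta_src)
    norm = math.sqrt(abs(a0) ** 2 + abs(a1) ** 2)
    en = [-a1.conjugate() / norm, a0.conjugate() / norm]  # noise-subspace basis
    spec = []
    for th in theta_scan:
        b0, b1 = steering(th)
        proj = en[0].conjugate() * b0 + en[1].conjugate() * b1
        spec.append(1.0 / (abs(proj) ** 2 + 1e-12))  # peaks where proj ~ 0
    return spec

angles = [math.radians(t) for t in range(-90, 91)]
spec = music_spectrum(angles, math.radians(30.0))
doa = math.degrees(angles[spec.index(max(spec))])  # estimated direction of arrival
```

In practice the noise subspace comes from an eigendecomposition (or, as in the paper, a GSVD against a noise correlation matrix) of the multichannel covariance, and that dense linear algebra over many frequency bins is precisely the work being moved to the GPU.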
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.24)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (7 more...)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)